Alessandro Rinaldo

Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function

Feb 01, 2026 (via arXiv)

A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts

Feb 01, 2026 (via arXiv)

Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization

Jan 21, 2026 (via arXiv)

On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts

May 24, 2025 (via arXiv)

On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating

May 16, 2025 (via arXiv)

Convergence Rates for Softmax Gating Mixture of Experts

Mar 05, 2025 (via arXiv)

Uncertainty quantification for Markov chains with application to temporal difference learning

Feb 19, 2025 (via arXiv)

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

Oct 21, 2024 (via arXiv)

Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence

Oct 19, 2024 (via arXiv)

Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts

May 22, 2024 (via arXiv)